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Abstract 

Data from student ratings of instruction (SRI) are used ubiquitously as a key element in providing 
instructors with valuable feedback and evaluators with critical student input. Nonetheless, calls for 
the elimination of SRI continue to appear in academic journals and higher education periodicals. This 
paper brings to bear the huge body of research on SRI to which so many academics and institutions 
have contributed. Some of the most egregiously erroneous statements about SRI are rebutted with 
brief reviews of the readily available compelling evidence. Although some faculty frustrations about 
misuse of SRI are valid, we argue that inclusion of student voice is critical. Students can provide 
useful feedback because they have firsthand experience over multiple occasions of what actually 
occurred in the classroom. Recommendations are made for best practices in using SRI as one of 
many sources for improving and evaluating teaching. 


The fact that an opinion has been widely held is no evidence whatever that it is not 
utterly absurd; indeed, ... a widely spread belief is more likely to be foolish than sensible. 

— Bertrand Russell, Marriage and Morals 


Institutions of higher education believe that improving 
teacher quality and effectiveness are priorities. These 
institutions know that perception of the quality of teaching 
is a principal reason students give for selecting a college 
or university (Shah, Nair, & Bennett, 2013), and institutions 
spend thousands of hours of faculty time and considerable 
financial resources to measure and improve student 
achievement (Cooper & Terrell, 2013). When making tenure 
decisions departments recognize faculty research in 
pedagogy and evaluation as legitimate academic discourse, 
as reflected in the many books focused on teaching and 
learning that appear every year. For example, "in the past 
10 years (2015 to 2005) Wiley published 646 books in 
its Higher and Adult Education Division; . . . of those, 175 
were specifically in the Teaching, Learning, and Curriculum 
category” (C. Allard, personal communication, January 8, 
2016). Academics, departments, and colleges accept, then, 
that teacher evaluation is integral to their missions. 

Data from student ratings of instruction (SRI) (aka "student 
evaluations of teaching” or "course evaluations”) are 
used ubiquitously as a key element in measuring teacher 
effectiveness. Indeed, student voice has been argued as 
essential to positive change in the classroom, because 
students can provide critical information on the improvement 
of teaching and learning (Quaglia & Corso, 2014). SRI are 
often part of decisions about merit salary, tenure, promotion, 


and helping faculty improve courses and instruction. Because 
of their widespread use— and, at times, misuse— SRI have 
understandably undergone extensive scrutiny with upwards of 
over 3,000 publications devoted to them (Benton & Cashin, 
2014). The topic has been studied extensively, perhaps 
more than any other in higher education. Why, then, do some 
institutions get teaching evaluation so wrong and so many 
misconceptions and misunderstandings persist about the 
validity and reliability of SRI? 

Claims of bias (non-instructional factors influencing 
evaluations) continue to appear in academic journals 
(e.g., Boring, Ottoboni, & Stark, 2016; MacNell, Driscoll, & 
Hunt, 2014; Stark & Freishtat, 2014) and higher education 
periodicals (e.g., Asher, 2013; Berrett, 2015; Flaherty, 

2016; Mulhere, 2014; Wieman, 2015; Zimmerman, 2014). 
Such claims frustrate and disappoint those familiar with the 
vast research literature providing empirical evidence that 
supports SRI utility (validity) and consistency (reliability) 

(see Benton & Cashin, 2011, 2014; Hativa, 2013b; Marsh, 
2007; Theall & Franklin, 2001, for reviews). Consequently, 
one purpose here is to address commonly held misbeliefs 
about SRI by bringing to bear the huge body of research 
and data to which so many academics and institutions 
have contributed. We begin by citing papers containing 
some of the most egregiously erroneous statements about 
SRI and rebutting those statements with brief reviews of 




the readily available compelling evidence. Next, given how 
frequently administrators and promotion/tenure committees 
misuse SRI and the deleterious consequences of doing 
so, we acknowledge and address faculty complaints about 
SRI. Then, we recommend ways SRI can be used to inform 
teachers and identify areas needing improvement within an 
institution. We end with a brief review of what the research 
indicates are current best practices in using SRI effectively in 
evaluating teaching. 

Flagrantly False Claims About SRI 

Bad Teachers Get Better Evaluations 

Among the more egregious claims is that “professors who 
receive high evaluations are worse teachers than their peers" 
(Zimmerman, 2014). In a similar vein, Stark and Freishtat 
(2014) voiced, “Good teachers can get bad evaluations." 

Both of these statements miss the point— SRI are designed 
to measure teaching effectiveness in a given course, not 
teaching effectiveness in general. Certainly a usually effective 
instructor can get a bad evaluation in a given course. The real 
question should be how do the ratings look across multiple 
courses? Research has established that at least six to eight 
class ratings should be collected before reliable, summative 
decisions about an individual’s teaching effectiveness can be 
made (Benton, Li, Brown, Guo, & Sullivan, 2015). So, yes, a 
good teacher can get bad evaluations on occasion, often due 
to personal or environmental factors influencing performance. 
But how does that mean that SRI are unreliable and should 
not be trusted? A good teacher who gets one poor evaluation 
across eight different classes is obviously a good teacher. 
Likewise, a teacher who gets poor evaluations across eight 
classes with only one good evaluation is, logically, in need of 
help and improvement. The odd poor evaluation for a good 
teacher is analogous to the coldest day argument by climate 
change deniers; an outlying single experience should not be 
compelling evidence. 

If the incidents of good teachers getting bad evaluations 
were widespread, we would not expect to find the consistent 
positive relationships between SRI and other indicators 
of teaching effectiveness. In point of fact, SRI correlate 
positively with: 

1. students’ actual achievement in the course as measured 
by exam performance (Beleche, Fairris, & Marks, 2012; 
Benton, Duchon, & Pallett, 2011), 

2. instructor self-ratings (Feldman, 1989a; Marsh, Overall, 

& Kesler, 1979; Marsh & Dunkin, 1997), and 

3. ratings by colleagues, administrators (Feldman, 1989a), 
and trained observers (Feldman, 1989a; Marsh & 

Dunkin, 1997; Murray, 1983). 

So, the comments claiming good teachers do not consistently 
receive the highest SRI are simply not true and not 
supportable based on research. Ratings from teachers 
themselves, their administrators, colleagues, alumni, and 
trained observers confirm the validity of SRI. 


“Tough” Demanding Teachers Receive Lower SRI 

This old saw has been around since the first days of SRI, and, 
unfortunately, gets repeated yet today along with its corollary 
that “easier" teachers get higher SRI. Or, as Lyell Asher 
(2013) asserted in an article published in the Wail Street 
Journal, “easing up on demands and raising grades will get 
you better reviews [student ratings] at the end” (para. 4). This 
assumption that students are out for the easy “A" is insulting 
to the vast numbers of students who are working hard to gain 
an education and is perhaps indicative of a general attitude 
toward students from those arguing vehemently against SRI 
use. 

The cynical assumptions underpinning these types of 
assertions are that students are interested only in grades 
and do not want to be challenged in their educations. But in 
a study involving over 50,000 classes across eight academic 
disciplines, Centra (2003) found that the grade students 
expected to earn was only weakly related to SRI. Others have 
similarly reported low, positive correlations (Braskamp & Ory, 
1994; Centra, 2003; Feldman, 1976; Floward & Maxwell, 
1980, 1982; Marsh & Dunkin, 1997; Marsh & Roche, 2000). 
Even this low positive correlation between grades and 
ratings may not necessarily indicate instructors are lowering 
standards to get higher ratings. It could well indicate that 
students who learn more earn higher grades and assign 
higher ratings, which supports the validity of SRI. A third 
possibility is that student characteristics, such as motivation 
and interest in the subject matter, could lead to greater 
learning and, therefore, higher grades and student ratings 
(McKeachie, 1997). 

Evidence shows that Asher’s assertion about grade leniency 
is not only wrongheaded but perhaps actually the inverse 
of the truth. If teachers really want to improve course 
ratings, they would do well to practice other more productive 
behaviors than assigning lenient grades. Challenging 
students, stimulating their interests (Marsh & Roche, 

2000), and making appropriate changes to instruction and 
the course based on student feedback (Centra, 2003) are 
more likely than leniency to lead to higher SRI and greater 
student learning. Moreover, research conducted in nearly 
500,000 classes across more than 300 institutions revealed 
that instructors are more likely to earn high SRI when their 
students say their teacher challenged them and had high 
achievement standards (Benton, Guo, Li, & Gross, 2013). Let 
us emphasize: a half million classes and 300 institutions. It’s 
time for SRI deniers to acknowledge that the research, the 
science, is simply not on their side. 

Students Are Not Qualified to Judge Teaching 
Effectiveness 

Arguing that the worst or most lenient teachers receive the 
highest ratings rests upon another fraudulent assumption: 
students are not qualified to judge teaching effectiveness. 
Understandably, we may be put off when the evaluators of 
our work are less educated than we are. Flow can we trust 
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undergraduate students to render valid judgments about 
our teaching effectiveness when most of them have never 
taught? Perhaps an analogy from another profession may 
be of use in answering this question. Typically, hospital 
administrators are expected to evaluate the effectiveness 
of physicians, a practice many who have had experience 
with doctor visits would support, as a patient’s voice should 
matter in decisions about improvement of care. Logically, 
one important factor in physician evaluation would be 
patient ratings of experiences with their physician that 
include perceptions of progress in recovery, the physician’s 
interpersonal skills, quality of care, and so forth (Manary, 
Boulding, Staelin, & Glickman, 2013). Patients are not 
doctors, but ignoring the input of patients about their doctors 
would be foolish. Students are not professors, but the same 
logic applies— ignoring student input is foolish. If patient 
input were the only source of evidence in making decisions 
about which doctors were effective, there would justifiably be 
concern. On the other hand, if we decided to ignore patient 
perceptions, we would lose out on some valuable information 
about how to improve medical services just as we lose 
valuable information about improving teaching if we ignore 
student feedback. Yet the old chestnut about students’ being 
incapable of providing useful feedback still finds its way into 
print even today. 

The time and resources devoted to debating whether 
students are capable could be better spent constructing 
the questions students are asked to answer. Students know 
how to assess if the right questions are asked. In fact, in a 
review of 31 studies Feldman (1989a) found that student 
views of what constitutes effective teaching are very similar 
to those of faculty (average correlation = .71). In the realm 
of patient health-care assessment, where some physicians 
and academics argue patient feedback is not credible 
because patients lack formal medical education, Manary and 
colleagues (Manary et al., 2013, para. 1) get it right: “when 
designed and administered appropriately, patient-experience 
surveys provide robust measures of quality.” The same can 
be said for SRI. 

SRI Are Unreliable 

Reliability refers to consistency, and well-constructed SRI 
have a great deal of it (see review by Benton & Cashin, 

2014). Actually, SRI within the same class tend to be highly 
consistent in students’ own ratings, in ratings over students 
within the same class, and in ratings of the same instructor 
across multiple courses. 

Perhaps the unreliability assertion is getting passed around 
again for two reasons. First, many home-grown ratings 
systems are poorly designed. Just as a camel is a horse 
designed by a committee, so goes the plight of many SRI 
surveys faculty committees create (Hativa, 2013a). Another 
possible explanation is the increase in web-based SRI, which 
typically have lower response rates than those administered 
on paper. The thinking may be that ratings based on low 


student response rates cannot be trusted. Certainly, ratings 
based on lower response rates cannot be assumed to 
represent the overall class perceptions as well as higher 
response rates. But, representativeness is a different issue 
than reliability. The former is tied to the percent of students 
in the class that respond— the greater the response rate, the 
more representative are the scores derived from the course 
rating. Reliability, on the other hand, is related to sample size, 
or the number of student raters. If 50 students out of a class 
of 100 responded to a survey, their ratings would be more 
statistically reliable than if 19 students out of a class of 20 
responded even though the 19 responders would be more 
representative. 

So even though the reliability of any measure does increase 
as the number of observations increase, it does not follow 
that a low number of observations means those observations 
are not reliable; even classes with low response rates can 
provide useful information for a teacher. Perhaps an analogy 
is helpful. If one person tells us we have a tail, we can write 
it off as ridiculous. If a second person says, “Hey, you have 
a tail,” they have our attention, but we can still think, “These 
people are clearly mistaken.” But, if a third person on a 
different occasion pulls us aside and says, “ H ey, did you 
know you have a tail?” we still might not necessarily conclude 
we have a tail, but we should certainly be concerned that 
folks are telling us that we do. The same goes for ratings, 
even those with low response rates. The first time we get 
“bad” ratings from a course with a low response rate, we can 
perhaps pay little attention to it. Perhaps it is an instance 
mentioned before of a good teacher getting an occasional 
poor rating. But, across time if the ratings are consistently 
low, even from classes in which few actually participate in 
the rating, we need to respond, gather additional information 
(e.g., from peers), and see what we can do to improve the 
situation. 

Because well-constructed SRI present multiple information 
from individuals (students) within a class and are collected 
across multiple occasions, one can make the case that 
students provide the most reliable source of feedback about 
teaching (Marsh, 2007). In contrast, class observations 
performed by an administrator or a peer— be they trained 
or untrained evaluators— typically represent only one 
observation on one occasion. In this case we do not know 
what the consistency/reliability is of their ratings. For 
reliability, trust SRI. 

Personal Factors Unrelated to Learning Influence 
Ratings 

Another favorite claim put forth yet unsupported by 
research is that overall ratings are strongly influenced by 
instructor characteristics unrelated to student learning, 
such as instructor gender, ethnicity, and personality (Stark 
& Freishtat, 2014). For example, the popular press gave 
unjustifiable attention to two studies that claimed gender 
bias in SRI (Boring et al., 2016; MacNell et al., 2014), which 
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were refuted in an editorial note (Ryalls, Benton, Barr, and Li, 
2016). In studies of SRI collected in actual classes, gender 
is only weakly related to ratings (see literature reviews by 
Benton & Cashin, 2011, 2014). Further, Li, Benton, and Ryalls 
(in press) analyzed data collected from IDEA SRI in over 
15,000 classes taught by female instructors and over 12,000 
taught by male instructors. Multiple institutions, Carnegie 
classifications, and disciplines were represented. The authors 
found no differences in overall ratings of teaching, the 
course, and average student progress on relevant learning 
objectives. 

When differences between male and female instructors 
have been found, they have typically occurred in research 
laboratory studies (where students rated descriptions of 
fictitious teachers who varied in gender). However, in studies 
of ratings of actual teachers there is only a very weak 
relationship that favors female instructors (Centra, 2009; 
Feldman, 1993). In reviewing several studies, Feldman 
(1992) concluded, “Any predispositions of students in the 
social laboratory to view male and female college teachers 
in certain ways (or the lack of such predispositions) may 
be modified by students’ actual experiences with their 
teachers in the classroom or lecture hall” (Feldman, 1992, 
p. 152). Feldman’s point corresponds to Gordon Allport’s 
(1954) contact theory, which suggests stereotypes can be 
overridden by actual personal interaction, a hypothesis well- 
supported according to Pettigrew & Tropp’s (2006) meta- 
analysis of 515 studies. Even electronic contact (both text- 
based and video-based online interactions) has been shown 
to reduce prejudice (Amichai-Hamburger & McKenna, 2006). 

With respect to race/ethnicity, very few studies of actual 
ratings of instructors in the classroom have been conducted 
(for reviews, see Benton & Cashin, 2014; Centra, 1993; 
Huston, 2005). Some studies have randomly assigned 
students to rate a fictitious instructor on various qualities 
based on changes in the instructor’s name, which implied 
a certain gender and ethnicity (Smith & Anderson, 2005; 
Anderson & Smith, 2005). Others were based on a computer- 
animated professor who varied in gender and race (Basow, 
Codos, & Martin, 2013). 

Conflicting conclusions were reported in two different studies 
conducted on local ratings at single institutions. In one, 

Black instructors received significantly lower scores than 
White and “Other” faculty on overall ratings of the course 
and teaching effectiveness (Smith, 2007). In the other study, 
no support was found for the initial hypothesis that students 
would rate minority faculty lower than majority faculty (Ludwig 
& Meacham, 1997). Clearly, given the limited number of 
studies and the conflicting outcomes, more research is 
needed. Racial bias cuts across all facets of society, and 
administrators and faculty are professionally responsible for 
examining the extent to which it exists at their institution. One 
of their most useful sources of information can be analyses 
of SRI in their local setting. When combined with other 


indicators of teaching effectiveness, SRI can be a great help 
in decision-making. 

The conclusion from studies of instructor personality is that 
it has little impact on ratings (Braskamp & Ory, 1994; Centra, 
1993). The few personality traits related to SRI, such as 
positive self-esteem, energy/enthusiasm, and orderliness, 
tend to enhance teaching effectiveness and are, therefore, 
not considered biases. Displaying energy and enthusiasm 
could stimulate student interest, which is positively correlated 
with multiple learning outcomes. Being orderly creates 
classroom structure, which is associated with greater student 
learning of cognitive outcomes (Benton et al., 2015). It is, 
therefore, not so much the personality of the instructor that 
matters as much as the personal characteristics manifested 
in the classroom. Most of the relationship between instructor 
personality and SRI is most likely connected to the behaviors 
the instructor displays in the classroom. Or as Braskamp and 
Ory (1994, p. 180) concluded, the influence of personality 
“may be caused more by what [instructors] do in their 
teaching than by who they are.” 

To say that teacher gender, race, and personality do not exert 
great influence on SRI is not to deny that bias does exist 
for some students in a class. Of course bias exists to some 
degree in student feedback, as course ratings are surveys 
designed and filled out by humans. But, bias in student 
ratings due to these instructor variables is not large and 
should not greatly affect teaching evaluations. Moreover, 
faculty need to understand and be comfortable that SRI are 
robust against other potential biases. For example, SRI are 
not strongly related to instructor age and teaching experience 
(Marsh & Hocevar, 1991). They are also robust against 
potential biases brought on by some student characteristics. 
Student gender is not highly correlated with ratings, although 
student-gender-by-instructor-gender interactions have been 
reported. In a study involving a large number of two- and 
four-year institutions across a variety of academic disciplines, 
Centra and Gaubatz (2000) found female students gave 
slightly higher ratings to female instructors. However, the 
authors did not consider the differences large enough to 
affect personnel decisions. Other student characteristics, 
including year in school (Davis, 2009; McKeachie, 1997), 
grade-point average (Braskamp & Ory, 1994; Centra, 1993; 
Marsh & Dunkin, 1997; Marsh & Roche, 2000; McKeachie, 
1997), and personality (Abrami, Perry, & Leventhal, 1982) 
have little or no relationship to SRI. Likewise, SRI are not 
meaningfully affected by the time of day the course is offered 
(Aleamoni, 1981; Feldman, 1979) and by whether they are 
administered online versus on paper (Benton, Webster, 

Gross, and Pallett, 2010a) or in online versus face-to-face 
courses (Benton, Webster, Gross, and Pallett, 2010b). 

To point out bias in a rating given by a human and use it to 
negate the usefulness of that rating makes no sense. If the 
efficacy of student feedback is to be ignored because of 
bias, then one must also throw out peer and administrator 
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feedback, as well as Promotion and Tenure Committees, 
annual reviews, reference letters, instructor self-reflections, 
and anything else that has a human element to it. Grades 
given by instructors would also be useless, as instructors are 
human and therefore full of bias. The question then is not “Is 
there bias in this tool?” but “Can we find usefulness in these 
data in spite of the bias inherent in humans?” The answer to 
this question is yes, provided the survey instrument is well 
designed. 

Students Tend to be Motivated More by Anger About 
a Low Grade than Satisfaction 

This claim assumes, when ratings are not mandatory, that 
students who are upset about their expected grade are more 
likely to complete the ratings form than students who are 
doing well in the course. However, the relationships between 
response rates and overall ratings of the teacher and course 
tend to be quite low in surveys administered online and on 
paper (Benton et al., 2010b) and in classes taught online 
and face-to-face (Benton et al., 2010a). The assumption 
that anger or revenge will cause more students to respond 
than other motivations seems unfounded. High achieving 
students are more likely than others to respond to an SRI 
survey (Avery, Bryant, Mathios, Kang, & Bell, 2006; Porter 
& Umbach, 2006; Porter & Whitcomb, 2005). Nonresponse 
to online SRI surveys is more common among grade D and 
F students compared to grades of A, B, and C (Adams & 
Umbach, 2012). Why would high-achieving students be more 
likely to respond? The answer may be that high-achieving 
students have more positive feelings about the course or 
the institution. In fact, student satisfaction with college 
is positively correlated with grade-point average (Kuh & 

Hu, 2001). High achievers are, therefore, more likely than 
others to respond to an SRI survey not out of anger but out 
of satisfaction. Also, it makes intuitive sense that D and F 
students are less likely to have done the work during the 
semester and are therefore less likely to do the ‘work’ of 
providing feedback at semester’s end. If they checked out 
during the semester, the chances of checking back in to rate 
the instructor is probably small. 

Millennial Students Are More Punishing in Their 
Ratings 

Every generation of teachers has probably uttered something 
like, “The students of today are not like the ones I taught 
years ago.” So, no surprise, the Millennial generation is 
getting its turn to be criticized. Members of the Millennial 
generation, ranging in birth dates from the early 1980s to the 
early 2000s, are believed to share common characteristics, 
one of which is a sense of entitlement (Nilson, 2013; Twenge, 
2006). Nilson (2013) has argued that Millennial students 
feel entitled to receive high grades without putting out 
much effort. Accordingly, some teachers fear that Millennial 
students will be even more likely than prior generations 
to assign low ratings to instructors who give lower than 
expected grades. In point of fact, average overall ratings 
of the instructor and course have increased steadily since 


2002 (Benton et al., 2015), which refutes the notion that 
Millennial tend to be more “punishing” in their ratings. 

Some might argue that the gradual inflation in SRI is evidence 
that instructors are “dumbing down” the curriculum to get 
high ratings. If that is the case, the fault lies not in the 
instruments used to assess teaching effectiveness but the 
system that over-emphasizes them and the faculty who 
lower their standards. But lowering standards is misguided 
because, as mentioned previously, ratings of teaching and 
the course are higher when students report the instructor had 
high achievement standards and expected students to share 
responsibility for their own learning (Benton et al., 2013). In 
addition, ratings of teaching effectiveness are higher when 
instructors encourage students to think for themselves (Zhao 
& Gallant, 2012). 

But, why has there been a steady increase in average 
ratings since 2002? If one is to believe, as has been argued, 
that SRI are valid and reliable, what accounts for the fact 
that Millennial are rating teachers higher than previous 
generations? One explanation is that students of today are 
simply more generous. But then if that were the case a high 
positive correlation between teacher standards and ratings 
would seem unlikely. Students are discriminating in their 
ratings of teachers— they do not merely assign high ratings to 
everyone. A more logical explanation is the success story of 
the growing emphasis institutions place on teaching. Faculty 
development is a growing field— recently having gained its 
own professional journal (Journal on Centers for Teaching 
and Learning) and its own professional organization, the 
Professional and Organizational Development (POD) Network 
in Higher Education. Many centers for teaching and learning 
(CTL), which provide developmental assistance to faculty 
seeking to improve teaching, are currently thriving (Flaherty, 
2014). 

Miami University provides a telling example of the impact a 
CTL has on a campus. In 2003 the university targeted the 
top 25 introductory courses with the largest enrollments 
and implemented a course redesign project based on an 
inquiry-oriented approach to teaching. Emphasis was placed 
on active learning, peer collaboration, and critical thinking. 

In time, this approach was generalized across other courses. 
Between 2003 and 2011 significant increases occurred 
in course application of theories or concepts to practical 
problems, student peer engagement, and the amount 
of preparation students reported doing outside of class 
(Nadler, Shore, Taylor, & Bakker, 2012). The trend reported 
at Miami is not isolated. The positive linear trends across 
time in overall ratings of teaching in the IDEA database 
have been accompanied by significant and meaningful 
increases in teaching methods associated with active 
learning (Benton et al., 2015). The largest increases have 
been in collaborative learning; involvement of students in 
hands-on projects such as research, case studies, or real-life 
activities; and facilitating interactions between students of 
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diverse backgrounds and viewpoints. So the steady increase 
in ratings found in the Millennial generation is associated 
with instructor use of more student-centered approaches to 
teaching. Indeed, SRI deniers who turn their backs on the 
overwhelming and incredibly useful information generated 
by SRI (because they deny their reliability) may well be 
turning their backs on some of the most positive pedagogical 
techniques emphasized in the past couple of decades. 

Summary 

The rebuttals made thus far are in response to the most 
unreasonable indictments of student ratings. Credible 
evidence is lacking to support the views that bad teachers 
get better evaluations, having high standards leads to low 
evaluations, students are not competent to rate, SRI are 
unreliable, personal factors unrelated to learning strongly 
influence ratings, students are more motivated to respond 
out of anger than satisfaction, and Millennial are more 
punishing in their ratings than previous generations. 

Faculty Frustration with SRI Is 
Understandable 

As shown above faculty frustrations with the evaluation 
process have led to some faulty claims about SRI themselves, 
but faculty may have legitimate gripes about the process and 
have just focused on the wrong sources of real problems in 
evaluation. SRI are not the problem, but faculty should be 
rightly upset when SRI are misused and overemphasized in 
summative decisions about teaching effectiveness. One way 
ratings are misused is when evaluators make too much of too 
little (Pallett, 2006). Although SRI tend to be highly consistent 
among students in the same class and across classes for 
the same instructor (Benton et al., 2015), as with any survey 
instrument there is “noise" in the data. We should expect 
ratings collected on consecutive days to vary like blood 
pressure, not remain constant like height. Yet, administrators 
may at times make decisions about salary increases based 
on small differences in mean ratings. Student ratings are 
overemphasized when they count too much in decisions 
about salary, promotion, and tenure. The IDEA Center has 
long recommended ratings should count no more than 
30% to 50% of the overall teaching evaluation (Hoyt & 

Pallett, 1999), which is only one aspect of the “three-legged 
stool” (i.e., teaching, scholarship, and service) of a holistic 
evaluation process. Administrators who make SRI the primary 
or only measure of teaching effectiveness create mistrust 
and a breeding ground for claims of bias in student ratings. 
But such misuse should not lead to indictments of SRI 
but rather of the process. Effective use requires collecting 
multiple indicators (e.g., peer ratings, artifacts) of teaching 
effectiveness. 

We demonstrated above that students are qualified to 
provide useful reliable feedback on teacher effectiveness, but 
there are many elements of teaching about which students 
are not qualified to judge: course objectives, the instructor’s 
subject-matter knowledge, assessments and grading 


standards (for more information on students’ appropriate 
role in teaching evaluation, see Arreola, 2006, and Hoyt & 
Pallett, 1999). Evaluation processes which confuse those 
aspects of teaching about which students are qualified to 
respond and those about which student input is not useful 
can and do add to faculty frustrations. Designers of teaching 
evaluation processes need to take care in constructing 
ratings forms, limit forms to those aspects of teaching about 
which students are capable of sound judgment, and make 
SRI only a part of a larger evaluation process. Good surveys 
are difficult to design, and institutions that take on the 
burden of self-designing their own SRI do so at their own peril. 
Those institutions who employ an instrument designed by a 
committee decades ago, or worse yet allow each department 
to develop its own tool, are at risk of making decisions based 
on questionable data. 

Faculty perhaps look to blame SRI for their unhappiness 
about evaluation because the evaluation process is not 
emotionally neutral. Even when ratings are used properly, the 
process of being evaluated can create anxiety which could 
negatively affect motivation to teach as well as motivation 
to try innovative methods (Theall & Franklin, 2001). For 
example, feedback from SRI could affect teacher self-efficacy, 
the belief in one’s ability to help students succeed. Self- 
efficacy is important because students achieve more when 
teachers believe they can make a difference in their students’ 
lives (Skaalvik & Skaalvik, 2008; Ware & Kitsantas, 2007). 
When they have high self-efficacy, instructors are more willing 
to try out new teaching strategies, set high achievement 
goals for their students, put forth effort, and persist in 
teaching (Ormrod, 2014). Fear of how teaching behaviors 
could affect ratings might create reluctance to take risks in 
the classroom or to cultivate high expectations. Teachers 
may fear innovation will be “punished” by low ratings. In 
contrast, teacher self-efficacy flourishes in contexts where 
feedback draws attention to effective teaching behaviors 
and provides constructive, specific recommendations for 
improvement (Hoy, 2000). Hence IDEA’S Diagnostic Feedback 
report identifies teaching methods as either “strengths to 
retain” or ones to consider increasing. The effect of feedback 
on self-efficacy is enhanced when instructors engage in 
discussion with an administrator or peer and feel free to ask 
for help (Hoy, 2000; Finnegan, 2013). Moreover, innovation is 
more likely when instructors believe their attempts to engage 
students in new methods will be rewarded, not punished. 

Many times faculty blame the messenger (SRI) for factors 
faculty feel are beyond their control. If no remedies exist 
to help with low SRI, then one’s efforts might just as well 
be directed against the rating instrument itself. Faculty 
sometimes make faulty attributions, causal explanations to 
explain success and failure. When they attribute low ratings 
to a faulty evaluation system, bias in the ratings, or student 
incompetence, instructors are likely to become angry at the 
evaluation system itself (Hareli & Weiner, 2002). And that 
anger can intensify when faculty feel trapped. At such times 
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SRI can also either positively or negatively affect expectancy, 
beliefs about the likelihood of responding successfully, given 
our current ability level and external events that may hinder 
performance (Wigfield & Eccles, 2000). One external event 
is sufficiency of resources and support available to assist us. 
If faculty receive negative feedback about their teaching but 
lack developmental resources, they may doubt their chances 
for improvement. Again, evaluation processes which do not 
include concrete specific ways to help faculty who wish to 
improve lead to frustrations. By using multiple measures, 
rewarding innovative approaches to teaching and employing a 
holistic evaluation process, faculty irritation about ratings can 
be diminished. 

Given the above discussion we should not be surprised, 
then, that countless hours and untold dollars have been 
invested in searching for negative evidence about SRI. In 
situations where they are used improperly, count too much 
in the evaluation process, and are not supported with helpful 
resources, we can simply reiterate that such evidence is 
not negative about SRI itself but the environment in which 
they are used. The huge preponderance of evidence shows 
the utility and reliability of SRI. So, as Theall and Franklin 
(2001) point out, the time and resources devoted to trying 
to prove that SRI are biased would be better directed toward 
developing new strategies for teaching and helping students 
learn. 

Can SRI Really Be Used to Improve Teaching? 

With the continual cycle of “end-of-course evaluations," one 
is tempted to ask, “Are all these ratings really helpful?” And 
the answer is demonstrably, convincingly, “Yes, provided the 
instrument asks the right questions." Just as importantly, 
SRI’s helpfulness can be increased depending upon what we 
do with the feedback we receive. Simply reflecting upon our 
SRI can make a difference, as a meta-analysis of 17 studies 
revealed; faculty who reflect on ratings administered midterm 
improve on end-of-course evaluations (Cohen, 1980). The 
greatest gains in SRI, however, are found among instructors 
who combine student feedback with consultation involving 
a peer or faculty development specialist (Brinko, 1990; 
Burbano, 1987; Cohen, 1980; Hampton & Reiser, 2004; 
Hativa, 2013a; Knol, 2013; Marincovich, 1999; Marsh, 2007; 
Marsh & Roche, 1993; Ory & Ryan, 2001; Penny & Coe, 
2004). When combined with consultation, feedback from 
ratings can have strong effects on the instructor’s knowledge, 
focus on teaching, and plans for improvement. For SRI 
deniers to debunk the SRI themselves to justify, perhaps, 
their own low ratings is especially troubling given that SRI 
yield so much useful information that those very deniers 
could use to boost their own student ratings of teaching skills 
and of how much students feel they learned (Knol, 2013). 

Discussing ratings with a peer or consultant improves their 
usefulness, especially when the consultation addresses 
problems students have identified (Marsh & Roche, 

1993). Common areas needing improvement are student- 


teacher interactions, active learning opportunities, teacher 
expectations and standards, class preparation, and 
assessments (McGowan & Graham, 2009). 

Mulling over feedback from SRI can be humbling and 
somewhat frightening. We have to take an honest look at 
what the students are telling us. If a trusted consultant is 
unavailable, we can at least engage in self-reflection by 
comparing students’ perspectives with our own experiences 
planning and teaching the course. If the students’ views 
are substantially different from our own, we can use that 
cognitive dissonance to take action. How might we teach 
the course differently next time? Are there simple changes 
we can make to address the issues raised? Any attempt 
at improvement is far better than simply placing the SRI 
class report in a folder and ignoring it until we assemble 
our teaching portfolios, provided administrators endorse 
the attempt as part of the process of effective teaching, 
regardless of its effect on the SRI. Without encouragement 
to innovate and take risks to improve built into an evaluation 
system, faculty will be unlikely to attempt new methods. 

Beyond the course level, SRI have a place in both assuring 
the quality of teaching and improving it within an institution 
(Palermo, 2013). Therefore, institutions should use SRI to 
identify areas of strengths and weaknesses at the unit and 
institution levels. Data aggregated across departments, 
colleges, and/or curricula (e.g., general education) can 
be analyzed to identify most frequent areas needing 
improvement. Department heads can then be encouraged 
to allocate resources for faculty development focused on 
targeted teaching behaviors and learning outcomes. Students 
can be informed of programmatic and course changes 
designed to address the feedback provided on the SRI report 
(Palermo, 2013). 

Best Practices in Using SRI for Teaching 
Evaluation 

When used appropriately, SRI have a place in summative 
evaluation. Whereas formative evaluation is focused on 
improvement, summative ratings can provide indirect 
evidence of instructional effectiveness. Ratings should 
be collected from every course, but not necessarily every 
semester, so that evaluators can look for global trends 
in the data, such as steady performance, declines, or 
improvements, as well as certain courses that may need 
attention. In using SRI for personnel decisions, the following 
best practices are recommended. 

Use Multiple Measures 

Any evaluation of teaching effectiveness should incorporate 
multiple measures, such as peer ratings of course goals, 
design, and assessments; direct student outcome measures 
(e.g., creations, projects, papers); instructor self-reflections; 
and so forth (Hoyt & Pallett, 1999). When the combined data 
are in agreement, reliability increases (Cashin, 1996). Faculty 
need to be selective and strategic when they assemble 
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multiple information sources (Halonen, Dunn, McCarthy, 

& Baker, 2012). The source should have actual knowledge 
of the measure being evaluated (Arreola, 2006). Students, 
for example, have firsthand knowledge of what occurs in 
the classroom or in online discussions but are unqualified 
to render judgments about the goals and content of the 
course, methods and materials used in teaching, and student 
evaluation practices (Keig & Waggoner, 1994). 

Peer review is a credible source for evaluation of course goals 
and objectives, intellectual content, methods and materials 
used in teaching, quality and appropriateness of evaluation 
practices, and evidence of student learning. But, peer review 
improves in validity when faculty undergo some training. 
Ratings by colleagues can be unreliable and less valid when 
done by untrained observers and with an unsystematic 
approach to the evaluation (Marsh, 2007; Marsh & Dunkin, 
1997). When done in an environment of distrust due to 
a faulty evaluation process, peer review can become 
downright protective, with no true feedback that could ever 
be construed by an administrator as negative. In this worst- 
case scenario, the peer review is not geared toward teaching 
development, but is instead written to protect a fellow 
faculty member from attack. In these cases, the process 
of evaluation itself needs an overhaul so that peers again 
feel comfortable in providing meaningful critique, allowing 
colleagues to assist each other in improving their teaching. 
External recognition from outside experts can provide 
evidence that the faculty member’s teaching is exemplary. 
Nominations for a teaching award, invitations to write a 
chapter or book about teaching, and requests to speak about 
teaching practices or share course materials are examples 
that the instructor’s teaching is praiseworthy. Embedded 
assessments (i.e., student completion of class assignments 
and activities aligned with learning outcomes) signal 
accomplishment of specific learning outcomes. Examples 
include student writing samples, self-reflections on service 
learning projects, comparisons between students’ subject- 
matter knowledge before and after instruction, and student 
portfolios of completed work. Participation in professional 
development activities demonstrates the desire to improve 
when evidenced by instructor self-reflection about how the 
activity led to modifications in the course or approaches to 
teaching. Exemplary contributions to the department are 
shown when teaching large sections, taking an extra teaching 
assignment to accommodate increased enrollments or a 
sick colleague, developing curriculum and aligning it with 
accreditation standards, and helping colleagues improve their 
teaching. 

Recognize the Limitations of Each Measure 

Each source of evidence has its shortcomings. The 
limitations of SRI have already been mentioned. We must 
also be cautious when interpreting information provided by 
the instructor. Some colleagues may not be comfortable 
reflecting on their own behavior, and in some cultures 
instructors might find it inappropriate to speak highly of 


themselves. Others are exceptionally skilled at self-promotion 
and at organizing a teaching portfolio (Zakrajsek, 2006), 
which can make fair assessment sometimes difficult. 

With respect to peer review of course materials and 
embedded assessments, additional cautions must be raised. 
Faculty may differ in their philosophies and approaches to 
teaching. A faculty member who takes a teacher-centered 
approach to instruction might not be the best person to 
evaluate a colleague who employs a student-centered 
approach with active learning (Zakrajsek, 2006). For this 
reason, at least three raters should review any one person’s 
materials. Another shortcoming is that peer review also 
requires a great commitment of time for training and 
evaluation. Rewarding faculty for agreeing to evaluate their 
colleagues is important; recognizing and rewarding peer 
evaluation as important service to the department and 
college will encourage this extremely useful activity. Finally, 
“friends as peer reviewers” should be avoided, because 
ratings that are glowing across all faculty are next to useless. 

External recognition and professional development also 
have their shortcomings. Evaluators must scrutinize the 
significance of such achievements and learn as much 
as they can about awards, nominations, invitations, 
and developmental opportunities. Was the focus of the 
professional development aligned with the strategic plan 
of the institution or department? Was it truly connected to 
better teaching? 

Finally, what constitutes exemplary contributions to the 
department must be well defined. Do online courses require 
more work than teaching face to face? It probably depends 
on how often instructors participate in online discussions, 
how quickly they respond to e-mails, and whether feedback to 
students is prompt and meaningful. 

All this gathering of information in addition to SRI demands 
time from already heavily scheduled faculty, and thoughtful 
analyses of the additional information demands yet more 
time of faculty and administrators. Yet given the key role of 
excellent teaching in the successes of institutions of higher 
learning, the time and effort devoted to making the best 
teachers possible is amplyjustified. 

Weight Measures According to Reliability and Validity 

Departments should decide in advance acceptable ranges 
of weights for the multiple measures. Those that have the 
highest reliability and validity should be given the greatest 
weight. For example, because SRI represent the perceptions 
of multiple raters across multiple occasions, they should be 
weighted more heavily than a single classroom observation 
by a colleague or administrator. Embedded assessments, by 
virtue of their direct measure of student learning, may also 
warrant significant weight. 
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Control for the Influence of Non-lnstructional 
Variables 

Is it true that SRI are never biased in any way? No. Teachers 
have legitimate concerns that circumstances beyond their 
control (a bias of some sort) and unrelated to how well they 
teach could negatively influence SRI. The National Council 
on Measurement in Education (NCME), in its website 
glossary, describes bias as “systematic errors in content, 
administration, and/or scoring that can cause test takers to 
get either lower or higher scores than their true ability would 
merit." The key is that the source of the bias is irrelevant to 
the trait being measured. Applied to student ratings, “Bias 
exists when a student, teacher, or course characteristic 
affects the evaluations made, either positively or negatively, 
but is unrelated to any criteria of good teaching" (Marsh, 
2007, p. 350). 

Using Marsh’s definition, some non-teaching factors do, 
indeed, influence ratings but are not necessarily biases, 
because they may reflect either good teaching or engaged 
students. For example, IDEA SRI are positively correlated with 
students’ desire to take the course (i.e., motivation), work 
habits, and class size (Benton et al., 2015). Students who are 
more motivated, apply themselves, and are enrolled in small 
classes actually do tend to learn more— an important criterion 
of effective teaching— and, therefore, tend to assign higher 
ratings. Nonetheless, institutions should control for the 
influence of these extraneous factors. The IDEA SRI system 
statistically attempts to level the playing field among faculty 
who teach courses of varying sizes and students of diverse 
motivations and work habits. For those who do not use IDEA’S 
SRI, some attempt should be made to equitably allot teaching 
conditions to otherwise “level the playing field” by rotating 
assignments between required and elective courses, first- 
year and upper-level classes, and classes of small and large 
enrollments. 

Provide Comparative Data 

SRI tend to be negatively skewed— that is, scores are inflated. 
On a five-point scale, average global ratings of teaching 
excellence are typically around 4.0 (Benton et al., 2015). 
Moreover, they vary by academic disciplines— social science 
courses tend to earn higher ratings than math and “hard” 
sciences (Braskamp & Ory, 1994; Cashin, 1990; Centra, 
1993, 2009; Hoyt & Lee, 2002b; Sixbury & Cashin, 1995). 
How, then, can one compare a score of 4.2 in an English 
class with a 3.8 in physics? The only fair comparison is to 
express the scores in standard deviation units from the 
respective discipline’s mean. The 4.2 might be close to the 
average score in English, whereas the 3.8 could be above 
the average for physics. IDEA provides comparative scores 
by discipline to account for such differences. Institutions that 
do not use IDEA should examine whether SRI differences by 
discipline are attributable to variations in quality of teaching, 
student background preparation, or subject-matter difficulty. 

Related to this is the finding that student self-ratings of their 
background preparation (Benton et al., 2015) and subject- 


matter difficulty (Centra, 1993, 2003; Marsh, 2001; Marsh 
& Roche, 2000) are correlated with SRI. Ratings tend to be 
higher in courses where students believe they had strong 
background preparation. Difficulty, on the other hand, shows 
a nonlinear relationship: Courses are rated lower when 
they are perceived as either too difficult or too elementary, 
a phenomenon known as the Goldilocks Effect. IDEA 
controls for these student variables via its adjusted scores. 
Institutions not using IDEA’S SRI should include items that tap 
into students’ perceptions of background preparation and 
course difficulty. Alternatively, teaching assignments may be 
rotated among courses that vary in student level and difficulty 
of material. 

Include Global Items 

Global or summary items provide evaluators a view of how 
students judged the overall quality of teaching and the course 
as well as how much they learned (Braskamp & Ory, 1994; 
Cashin, 1999; Centra, 1993). IDEA’S overall ratings of the 
excellence of teaching and the course are highly correlated 
with average student ratings of progress on relevant course 
objectives (PRO). In turn, PRO is positively correlated with 
performance on course exams (Benton, Duchon, & Pallett, 
2011). Global items provide a more valid measure of overall 
impressions than does averaging several dissimilar items 
together into a single score (Cashin, 1999). 

Vary Evaluation Schedules 

How frequently and for how many courses faculty should 
administer SRI ought to depend on faculty status and the 
purpose of the evaluation (Hoyt & Pallett, 1999). For first- 
year faculty it might make sense to collect student ratings 
data for every course and section. By the time employment 
recommendations are made, at least two sets of ratings 
should have been collected for the faculty member. The same 
holds true for decisions about promotions in rank. If teaching 
is connected to merit salary recommendations, all faculty 
must have some annual SRI data available. For tenured 
faculty, student ratings might be collected for each course 
biennially. 

Use Written Comments Only Formatively 

Although student written comments correlate positively with 
quantitative global ratings of the instructor (Braskamp, Ory, 

& Pieper, 1981; Burdsal & Harrison, 2008; Ory, Braskamp, & 
Pieper, 1980), faculty generally consider them less credible 
for decisions about tenure and promotion. Their chief value 
lies in the contributions they can make to improving teaching 
or the course (Braskamp et al., 1981), unless the institution 
employs sophisticated qualitative data analyses. 

Employ Standardized Administration Procedures 

Faculty must have confidence that ratings are collected 
similarly across courses and instructors. Written instructions 
to students should be standardized. Instructors should 
leave the room because ratings tend to be higher when the 
instructor is present (Braskamp & Ory, 1994; Centra, 1993; 
Feldman, 1979; Marsh & Dunkin, 1992). Ratings should be 
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collected by a neutral party and the data taken to a location 
where they remain unavailable to the instructor until after 
grades have been submitted (Cashin, 1999). To ensure 
monitoring of procedures, students should be informed of 
policies and provided the means to report instructors who 
violate them. 

Protect Student Confidentiality 

Just as faculty must have confidence in the system, students 
must be assured their responses will remain confidential. 
Inform students that data will be held in a secure 
environment, will only be analyzed at the class level, and that 
results presented to the instructor will not be associated with 
any identifying information. 

Encourage Good Response Rates 

Concerns about low response rates have accompanied the 
increase in online survey administration. Although the online 
format may enhance standardization of administration 
procedures and better ensure student confidentiality, 
students are less likely to respond (Benton etal., 2015). 
Several constructive actions can be taken to increase the 
likelihood of a high response rate ( http://ideaedu.org/ 
support/existing-idea-paper-and-online-clients/idea-online- 

support/best-practices-for-online-response-rates ). Instructors 
can create value for student feedback by placing IDEA 
relevant objectives alongside specific course objectives in 
the syllabus, informing students about modifications made in 
the course based on previous student feedback, encouraging 
them to complete the ratings, distributing a copy of a sample 
report given to instructors, and assuring confidentiality of 
responses. Institutions can communicate reminders through 
social media, university portals, learning management 
systems, department web sites, student publications, radio, 
flyers, and posters. For online surveys the instructors can 
ask students to bring a mobile device, tablet, or laptop to 
class on a day set aside for completing the ratings. Faculty 
should ensure the accuracy of students’ “Respondent 
Identifier Labels.” Ultimately, ongoing assessment should be 
championed as part of the institutional culture of monitoring 
and ensuring program quality. 

Educate Administrators and Faculty 

Given the ramifications of decisions made from teaching 
evaluations, administrators and faculty should be educated 
about how to interpret student ratings reports. It is imperative 
they understand that SRI should be part of a holistic 
evaluation of teaching. No major decision about a faculty 
member should ever be based on a single score. Evaluators 
should also recognize that all scores have error, and the 
estimated amount of error in reported scores should be 
communicated. Time should also be set aside to review, 
with the instructor, sample reports and interpret case 
studies based on student ratings data (Cashin, 1999). And 
administrators should have in mind remedies and/or specific 
suggestions for faculty; a plan of action for underperforming 
instructors should be designed and agreed to by both parties. 


Conclusion 

Validity (utility) is not a characteristic inherent in any 
measure, be it student ratings, classroom assessments, 
or standardized tests. Validity is tied to use— in how we 
interpret ratings and the actions we take based upon those 
interpretations. When used appropriately, SRI serve several 
important purposes. They can help faculty improve their 
teaching; administrators make decisions about salaries, 
reappointments, promotions, and tenure; institutions conduct 
program reviews; and students select courses. But, they 
should never be the only information source for such decision 
making. 

Unfortunately, the valid use of ratings falls far short of its 
potential for several reasons (Pallett, 2006). Ratings tend to 
be overemphasized in summative decisions about teaching 
effectiveness. This overemphasis is easily explained. SRI 
can take little of a teacher’s and department head’s time 
compared to larger investments required in peer review, 
counseling, student interviews, accumulating other sources 
of teaching effectiveness, and so on. If a department is not 
seriously interested in improving teaching, SRI’s inherent 
efficiencies tempt some departments into relying too much 
on them. Moreover, because SRI are quantifiable, they seem 
to lend themselves to widespread application; because SRI 
yield numbers, some people believe they can be applied 
across disciplines without adjustment, for example. Also, 
because SRI are efficient relative to other measures (e.g., 
peer and administrator observations, reviews of teaching 
materials), some institutions find it easy to forego the other 
measures entirely. And, of course, the most disappointing 
failure with SRI is that, even after faculty and department 
time has been spent on administering the SRI, faculty and 
departments do not capitalize on the investment they have 
already made; users often fail to reflect on SRI feedback and 
analyze the gathered information to make improvements. 
When ratings are overemphasized for summative decisions 
and underutilized for developmental purposes, their value is, 
unfortunately, reduced. 

Our hope is that faculty, department heads, and other 
administrators who come across some of the SRI deniers’ 
statements we referenced early on in this paper will be 
convinced by the overwhelming research regarding SRIs to 
deny the deniers’ claims. SRI should count for something in 
a comprehensive faculty evaluation system. Student voices 
are critical, not only because they provide some quality 
control, but also because students have first-hand experience 
of what actually occurred in the classroom. When we take 
a comprehensive approach to evaluation of teaching, SRI 
should be one of the cornerstones of such evaluations no 
matter what the SRI deniers claim. 
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